Japanese vowel recognition based on structural representation of speech
Abstract
Speech acoustics varies from speaker to speaker, microphone to microphone, room to room, line to line, etc. Physically speaking, every speech sample is distorted. Socially speaking, however, speech is the easiest communication medium for humans. To cope with these inevitable distortions, speech engineers have built HMMs from the speech data of hundreds or thousands of speakers, and the resulting models are called speaker-independent models. They often still need to be adapted to the input speaker or environment, however, which suggests that speaker-independent models are not really speaker-independent. Recently, a novel acoustic representation of speech was proposed in which the dimensions of the above distortions can hardly be seen. It discards every acoustic substance of speech and captures only the interrelations among them, so that speech acoustics is represented structurally. The new representation can be interpreted linguistically as a physical implementation of structural phonology and psychologically as speech Gestalt. This paper reports the first recognition experiment carried out to investigate the performance of the new representation. The results showed that the new models, trained from a single speaker with no normalization, can outperform the conventional models trained from 4,130 speakers with CMN (cepstral mean normalization).
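The idea of keeping only interrelations can be illustrated with a small sketch. The abstract does not say how the interrelations are measured, so everything below is an assumption made for illustration: each speech event is modeled as a Gaussian, the interrelation between two events is taken to be the Bhattacharyya distance, and a speaker or channel distortion is modeled as an invertible affine map of the feature space. The names `bhattacharyya` and `structure_matrix` are hypothetical helpers, and the data are random numbers, not speech. Under these assumptions the matrix of pairwise distances stays the same before and after the distortion.

```python
import numpy as np


def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (assumed distance measure)."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    # Mahalanobis-like term on the mean difference.
    term_mean = diff @ np.linalg.solve(cov, diff) / 8.0
    # Log-determinant term comparing the averaged and individual covariances.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def structure_matrix(means, covs):
    """Symmetric matrix of pairwise distances between all events."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist


rng = np.random.default_rng(0)
dim, n_events = 3, 5

# Synthetic "speech events": random means and symmetric positive-definite covariances.
means = [rng.normal(size=dim) for _ in range(n_events)]
covs = []
for _ in range(n_events):
    m = rng.normal(size=(dim, dim))
    covs.append(m @ m.T + dim * np.eye(dim))

# Assumed distortion x -> A x + b, with A invertible by construction
# (orthogonal matrix times a positive diagonal scaling).
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
A = q @ np.diag([0.5, 1.5, 2.0])
b = rng.normal(size=dim)

means_distorted = [A @ mu + b for mu in means]
covs_distorted = [A @ c @ A.T for c in covs]

structure_before = structure_matrix(means, covs)
structure_after = structure_matrix(means_distorted, covs_distorted)

# The absolute positions changed, but the pairwise structure did not.
print(np.allclose(structure_before, structure_after))
```

The means and covariances themselves change under the map, which is why a representation built only from the pairwise distances would not need to be adapted to the distortion in this toy setting.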
Related papers
Recognition of continuous utterances of Japanese vowel sequences based on structural representation of speech
Non-linguistic features such as vocal tract shapes and acoustic devices are inevitably involved in speech. Recently, a new representation of speech without any dimensions indicating the non-linguistic features was proposed. It discards the absolute properties of speech events and captures only the interrelations among them. In this paper, recognition experiments of continuous utterances of Japa...
Automatic Recognition of Japanese Vowel Sequences Using Structural Representation of Speech
When we humans communicate with each other by means of speech, non-linguistic features are inevitably involved in every step of speech production, encoding, transmission, decoding, and hearing. Recently, a new acoustic representation of speech without any dimensions indicating the non-linguistic features was proposed. It captures only the interrelations among speech events and can be interprete...
Recognition of Connected Japanese Vowel Utterances Using Random Discriminant Structure Analysis
Automatic speech recognition has to deal with the non-linguistic variations of speech signals. Many non-linguistic variations can be modeled as the transformations of features. The universal structure of speech [12], [13], proves to be invariant to the feature transformations, and thus provides a robust representation for speech recognition. One of the difficulties of using the structure repres...
Syllable-based acoustic modeling for Japanese spontaneous speech recognition
We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished in their definitions, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...
Automatic recognition of Japanese vowel sequences in noise using structural representation of speech
Non-linguistic features such as vocal tract shapes and acoustic devices are inevitably involved in speech. Recently, a new representation of speech without any dimensions indicating the non-linguistic features was proposed. It discards the absolute properties of speech events and captures only the interrelations among them. In this paper, first, analysis experiments of the representation in noi...